King Edward VII BTEC Applied Science visit

Mark Dunning - The University of Sheffield

Introduction to Bioinformatics

Definition

Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex.

Data Explosion

Genetic Code

A blueprint for life (an analogy)

Back to the human genome

Why use computers

Demo time!

Data Management

Context

No matter how much of the analysis is automated, some manual steps are inevitably involved

“metadata”

Short Exercise

Rule 1

Rule 1 - Maintain consistency

Example 1

Patient ID  Sex     Date of Diagnosis          Tumour Size
1           M       01-01-2013                 3.1
2           f       04-18-1998                 1.5
3           Male    1st of April 2004          105
4           Female  NA                         67
5           F       2010/03/12                 4.2
6           F                                  3.6
7           M       1994-11-05T08:15:30-05:00  232

Regarding dates

credit: @myusuf3

Example 1 - corrected

Patient ID  Sex  Date of Diagnosis  Tumour Size
001         M    2013-01-01         3.1
002         F    1998-04-18         1.5
003         M    2004-04-01         1.05
004         F    NA                 0.67
005         F    2010-03-12         4.2
006         F    NA                 3.6
007         M    1994-11-05         2.32
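
The clean-up shown in the corrected table can be sketched in a few lines of Python. This is only an illustration, not the method used in the session: the function names `normalise_sex` and `normalise_date` are made up for this example, and reading `04-18-1998` as month-day-year is an assumption (a value such as `01-02-2013` would be ambiguous). Tumour sizes are left alone here because the right unit conversion cannot be inferred from the table itself.

```python
import re
from datetime import datetime

# Every spelling of sex seen in Example 1, mapped to one code
SEX_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}

# Date layouts seen in Example 1; "%m-%d-%Y" assumes month-day-year ordering
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%m-%d-%Y", "%d %B %Y")


def normalise_sex(value):
    """Return 'M' or 'F', or 'NA' if the value is not recognised."""
    return SEX_CODES.get(value.strip().lower(), "NA")


def normalise_date(value):
    """Return the date as YYYY-MM-DD, or 'NA' if it cannot be parsed."""
    text = value.strip()
    if text in ("", "NA"):
        return "NA"
    # Drop the time part of a timestamp such as 1994-11-05T08:15:30-05:00
    text = re.sub(r"T\d{2}:\d{2}.*$", "", text)
    # Turn "1st of April 2004" into "1 April 2004"
    text = re.sub(r"(\d+)(?:st|nd|rd|th)(?: of)?\b", r"\1", text)
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return "NA"


if __name__ == "__main__":
    for raw in ["01-01-2013", "1st of April 2004", "2010/03/12", ""]:
        print(repr(raw), "->", normalise_date(raw))
```

Anything the functions cannot interpret becomes `NA` rather than a guess, so problem values stay visible.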

Rule 2

Rule 2 - Never work directly on the raw data
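
One way to follow this rule, sketched in Python under some assumptions: the helper name `make_working_copy` and the folder layout are hypothetical, and the read-only flag behaves as described on Linux and macOS (Windows handles file permissions differently).

```python
import os
import shutil
import stat


def make_working_copy(raw_path, work_dir):
    """Copy the raw file into work_dir and mark the original read-only."""
    os.makedirs(work_dir, exist_ok=True)
    work_path = os.path.join(work_dir, os.path.basename(raw_path))
    # copyfile copies the contents only, so the copy stays writable
    shutil.copyfile(raw_path, work_path)
    # Remove write permission from the raw file to prevent accidental edits
    os.chmod(raw_path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return work_path
```

All editing and analysis then happens on the file returned by the function, and the raw file can always be used to start again.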

Rule 3

Figure showing locations of visitors to my Prostate Cancer data portal

Rule 3 - Don’t use 0 to mean missing
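
A small Python sketch of why this matters. The readings are invented for illustration, and `mean_ignoring_missing` is a hypothetical helper: a 0 is treated as a real measurement and drags the average down, whereas an explicit missing marker (`None` here) can be skipped.

```python
from statistics import mean


def mean_ignoring_missing(values):
    """Average the values that were actually observed (not None)."""
    observed = [v for v in values if v is not None]
    return mean(observed) if observed else None


# Invented readings: the second measurement was never taken
coded_as_zero = [4.0, 0, 6.0]
coded_as_missing = [4.0, None, 6.0]

print(mean(coded_as_zero))                      # the 0 counts as a real value
print(mean_ignoring_missing(coded_as_missing))  # the missing value is skipped
```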

Rule 4

Patient ID  Date        Value
1           2015-06-14  213
2                       76.5
3           2015-06-18  32
4                       120.3
5                       109
6           2015-06-20
7                       143

Rule 4 - Fill in all the cells

Example 2 - corrected

Patient ID  Date        Value
1           2015-06-14  213
2           2015-06-14  76.5
3           2015-06-18  32
4           2015-06-18  120.3
5           2015-06-18  109
6           2015-06-20  NA
7           2015-06-20  143
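
The correction above treats the two kinds of blank cell differently: a blank date is taken to mean "same as the row above", while a blank value is genuinely missing and becomes `NA`. A minimal Python sketch of both steps follows; the function names are made up for this example, and the "same as above" reading is inferred from the corrected table rather than stated in the data.

```python
def fill_down(column):
    """Replace each blank cell with the last non-blank cell above it."""
    filled = []
    last = "NA"  # used only if the column starts with a blank
    for cell in column:
        if cell != "":
            last = cell
        filled.append(last)
    return filled


def mark_missing(column):
    """Replace each blank cell with an explicit NA."""
    return [cell if cell != "" else "NA" for cell in column]


# The Date and Value columns from Example 2, blanks as empty strings
dates = ["2015-06-14", "", "2015-06-18", "", "", "2015-06-20", ""]
values = ["213", "76.5", "32", "120.3", "109", "", "143"]

for row in zip(range(1, 8), fill_down(dates), mark_missing(values)):
    print(*row)
```

Filling down is only safe when a blank really does mean "repeat the value above", which is why it is applied to the dates and not to the measurements.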

Rule 5

Rule 5 - Make it a rectangle
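
A quick way to test whether a table is rectangular is to check that every row has as many cells as the header. The sketch below uses Python's standard `csv` module and assumes the spreadsheet has been exported as comma-separated text; `ragged_rows` is a hypothetical helper name.

```python
import csv
import io


def ragged_rows(csv_text):
    """Return (line number, cell count) for rows that do not match the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    width = len(rows[0])  # the header defines how wide the table should be
    return [
        (line_no, len(row))
        for line_no, row in enumerate(rows[1:], start=2)
        if len(row) != width
    ]
```

An empty result means every row has the same number of cells as the header; anything else points to the lines that need attention.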

More

Computer doesn’t recognize it!

Demo time